Overview of BioASQ 2021-MESINESP track. Evaluation of advance hierarchical classification techniques for scientific literature, patents and clinical trials
CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania
There is a pressing need to exploit recent advances in natural language processing technologies, in
particular language models and deep learning approaches, to enable improved retrieval, classification
and ultimately access to information contained in multiple, heterogeneous types of documents. This is
particularly true for the field of biomedicine and clinical research, where medical experts and scientists
need to carry out complex search queries against a variety of document collections, including literature,
patents, clinical trials or other kinds of content such as EHRs. Indexing documents with structured controlled
vocabularies used for semantic search engines and query expansion purposes is a critical task for enabling
sophisticated user queries and even cross-language retrieval. Due to the complexity of the medical domain
and the use of very large hierarchical indexing terminologies, implementing efficient automatic systems
to aid manual indexing is extremely difficult. This paper provides a summary of the MESINESP task
results on medical semantic indexing in Spanish (BioASQ/CLEF 2021 Challenge). MESINESP was carried
out in direct collaboration with literature content databases and medical indexing experts using the DeCS
vocabulary, a resource similar to MeSH terms. Seven participating teams used advanced technologies
including extreme multi-label classification and deep language models to solve this challenge, which can
be viewed as a multi-label classification problem. As part of the MESINESP resources, we have released a Gold Standard
collection of 243,000 documents with a total of 2179 manual annotations divided into training, development
and test subsets covering literature, patents as well as clinical trial summaries, under a cross-genre
training and data labeling scenario. Manual indexing of the evaluation subsets was carried out by three
independent experts using a specially developed indexing interface called ASIT. Additionally, we have
published a collection of large-scale automatic semantic annotations of these documents, generated by NER
systems, with mentions of drugs/medications (170,000), symptoms (137,000), diseases (840,000) and
clinical procedures (415,000). In addition to a summary of the technologies used by the teams, this paper